Accessibility and Intractability of the Report:
::: This report contains several features which may aid with user experience, such as Alt-Text for each image.
Please note you may wish to collapse all code chunks at the start for best storytelling. Similarly, when reviewing the code chunks, the palette generally follows the color-coding: comments, functions, and values.
Any text in blue outside of code chunks contains a hyperlink or can be interacted with (try to click on the author for a link to the GitHub Repo!). :::
Perhaps surprisingly, I belong to a special group. Unlike the thousands of people around me, I (knocks on wood) do not have hay fever. Not even a tiny seasonal allergy! I have never known the pains of constant sneezing, allergy-driven insomnia, and never-ending stuffiness. Almost 10 million people in the UK would probably hate me if they knew.
As a great friend to my poor, hay-feverish friends and fellow citizens, I have decided to study antihistamine prescription trends and potential environmental relationships.
Hay fever is most commonly addressed with this group of drugs, constituting one of their most common uses. This malady is oftentimes seasonal, aligning with pollen-rich months (generally around spring). Generally, differently sensitized people will have varied reactions to the many pollen sources. However, recent research on tree allergenicity, has identified several popular urban trees to be highly allergenic to a wide proportion of their target population (Netherlands) (De Weger et al. 2024).
This report aims to answer the following questions, which are both of personal and public health interest:
Findings from this report could be of special interest for Edinburgh’s City Plan 2030, which will be heavily involved with the city’s Thriving Greenspaces Strategy.
for (package in c("tidyverse","janitor","gt","gtExtras","forcats","ggspatial","osmdata","raster","sf","data.table","scales", "leaflet", "htmltools")) {eval(bquote(library(.(package))))}The report relies on data collected by the NHS ‘Prescriptions in the Community’, focusing on JUL-2023 to JUN-2024 (dates selected for a yearly trend analysis, with up-to-day data comparison with the Tree dataset). This dataset collects details on the prescriptions sold attributed to each GP Code.
To bind these with a geographical location, the GP codes first need to be linked to their UPRN (Unique Property Reference Number) using the GP_Practices dataset, which can then be used to identify the Easting and Northing coordinates using the NSUL Dataset provided by the UK Government (FEB-2023). The Tree dataset contains the location and species of trees maintained by the Edinburgh City Council (JUN-2024). The Postal Sector dataset will be used for geographical mapping of city boundaries. The following must be downloaded and loaded onto the data folder for the report to work:
The selected antihistamines are shown below, and were obtained from the NHS Hay Fever treatment recommendations.
postal_sector <- st_read("C:/Users/Data/Downloads/GB_Postcodes/PostalSector.shp", quiet = TRUE)
gp_to_nsul = read.csv("C:/Users/Data/Downloads/GP_Practices_-_Scotland.csv")
trees = read.csv("C:/Users/Data/Downloads/Trees.csv")
nsul = read.csv("C:/Users/Data/Downloads/NSUL_FEB_2023_SC.csv")
antihistamines = c("chlorphenamine", "cinnarizine", "diphenhydramine", "hydroxyzine", "promethazine", "acrivastine", "cetirizine", "fexofenadine", "loratadine")
# To access the prescription data, I will be constructing the link dynamically. Each link was seen to have two variable components: the date and the resource number. I have collected them below, ordered in pairs matching index position.
dates = c("202307","202308","202309","202310","202311","202312","202401","202402","202403","202404","202405","202406")
resources = c("7bb45ee6-6f1c-45f4-933b-958bdbe0ca4f","72ca4ad2-0228-4672-9eb0-cc911e4a8ca7","2d3d240f-1467-4a91-9f0e-769745650cb9","94134438-db1d-4b8d-8f70-5e9b0e47bd03","21d11fed-a494-4e30-9bb6-46143c3b9530","00cdf8d7-f784-4e87-894c-34f2540ea6ab","d3eaaf84-0f3b-4fb8-9460-e33503095fbe","86fda652-0e1d-48bb-9368-aa2a560d925b","a42762ac-47cb-4fb6-b9b1-2478a588c0ed","409a02f2-b77c-47a0-917d-4a5a1c90f182","5fbbd126-6166-4249-9620-7ed78e877297","95f7f250-bd04-4e4a-b853-5df75b00a632")
for (date in dates){assign(paste0("prescriptions_", date),read_csv(paste0("https://www.opendata.nhs.scot/dataset/84393984-14e9-4b0d-a797-b288db64d088/resource/",resources[match(date,dates)],"/download/pitc",date,".csv")) %>% clean_names())} #Assign + paste0 can create objects with varying names inside a loop!
prescriptions_all = rbindlist(lapply(ls(pattern="^prescriptions"), function(x) get(x)))# Create a master df of prescriptions
remove(list = ls(pattern="^prescriptions_2")) # And I remove the individual prescription df to clear space!In the following code, the prescriptions dataset is first filtered to
produce antihistamines_data, which contains the total sales
for each antihistamine (defined above) by GP per year.
antihist_sold_by_drug.A different aggregation for total sales by drug and month:
antihist_sold_by_month_and_drug.
antihist_sold_by_month_and_drug_3.And for total antihistamine sales by GP:
antihist_sold_by_gp.
antihistamines_data = as.data.frame(c())
for (drug in antihistamines){
drug_df = prescriptions_all %>%
dplyr::filter(str_detect(tolower(bnf_item_description), drug)) %>%
group_by(gp_practice, date = paid_date_month) %>%
summarise(drug = drug, paid_quantity = sum(paid_quantity))
antihistamines_data = rbind(antihistamines_data, drug_df)}
antihist_sold_by_drug = antihistamines_data %>%
group_by(drug = str_to_title(drug)) %>%
summarize(total = sum(paid_quantity),
proportion = total * 100/ sum(antihistamines_data$paid_quantity)) %>%
arrange(desc(total))
antihist_sold_by_month_and_drug = antihistamines_data %>%
mutate(date = as.Date(paste0("01",date),format="%d%Y%m")) %>%
group_by(drug = str_to_title(drug), date) %>%
summarize(total = sum(as.integer(paid_quantity))) %>%
mutate(drug = factor(drug, levels = antihist_sold_by_drug$drug, ordered = T))
antihist_sold_by_month_and_drug_3 = antihist_sold_by_month_and_drug %>%
group_by(drug) %>%
arrange(desc(total)) %>%
top_n(3,wt = total)
antihist_sold_by_gp = antihistamines_data %>%
group_by(gp_practice) %>%
summarise(antihist_sold = sum(paid_quantity))To later plot a map of Edinburgh with districts, GP locations and their antihistamine sales, and the position of trees (both all, and specifically allergenic), the geographical datasets introduced above are to be filtered and joined.
First, a simple filter of the postal_sector data to
limit the sprawl to only ‘Edinburgh’ gives:
postal_sector_edi.
Selecting the rows of nsul whose postcode starts
with “EH”, and keeping only the NSUL code and their corresponding
coordinates produces nsul_edi.
This is then joined with a similarly filtered
gp_to_nsul object using the NSUL code, which creates a
dataset with GP codes and their coordinates. The GP code is then used to
join in the antihistamine sales each produced from
antihist_sold_by_gp, keeping only GPs for which both the
location and the sales are known. This is object
gp_edi_sf.
Finally, all_trees_sf is directly the original
trees dataset, re-formatted for geographical plotting.
allergenic_trees_sf takes the above, keeping only
the trees belonging to genus classified as “very strong” or “strong”
allergenic by De Weger et al. (2024)
(namely: Betula, Alnus and Corylus).postal_sector_edi = postal_sector %>% dplyr::filter(Sprawl == "Edinburgh") # Get the Edinburgh Districts
nsul_edi = nsul %>% dplyr::filter(str_detect(PCDS,"^EH")) %>% # Get the GP Information
rename(Easting = "GRIDGB1E",
Northing = "GRIDGB1N",
uprn = "UPRN") %>% ##u
dplyr::select(Easting, Northing, uprn)
gp_edi_sf = gp_to_nsul %>%
dplyr::filter(str_detect(postcode,"^EH")) %>% # Filter the GPs in Edinburgh
left_join(., nsul_edi, by = "uprn") %>% # Add their coordinates using UPRN
dplyr::filter(!is.na(Easting)) %>% # Remove GPs with unknown coordinates
inner_join(.,antihist_sold_by_gp, by = c("prac_code" = "gp_practice")) %>% # Add antihistamines sold, only keep GPs for which we also know the antihistamines.
st_as_sf(., coords = c("Easting", "Northing"), crs = st_crs(postal_sector_edi)) %>% # Transform into sf points based on Edi Districts and remove empty ##u
dplyr::filter(!st_is_empty(geometry))
all_trees_sf <- st_as_sf(trees, coords = c("X", "Y"), crs = st_crs(postal_sector_edi))
allergenic_trees_sf = all_trees_sf %>% filter(str_detect(tolower(trees$LatinName),"betula|alnus|corylus"))The code below produces antihistamine_sales_table, a
table ranking the selected antihistamines by total sales across Scotland
for the JUL-2023 to JUN-2024 period.
# Get a table with how many sales each antihistamine had
antihistamine_sales_table <- antihist_sold_by_drug %>%
gt() %>%
tab_header(title = md("**Antihistamine Sales in Scotland**"), subtitle = "Data from NHS Scotland[...]") %>%
cols_label(drug = md("**Antihistamine**"), total = md("**Total Sales**"), proportion = md("**Proportion of<br />Antihistamine Sales**")) %>%
data_color(columns = total, method = "numeric", palette = "viridis", reverse = TRUE, domain = c(0, max(antihist_sold_by_drug$total))) %>%
fmt_number(columns = total, sep_mark = ",", decimals = 0) %>%
fmt_percent(columns = proportion, decimals = 2) %>%
tab_options(
column_labels.border.bottom.width = px(2),
column_labels.border.bottom.color = "black",
table.border.top.color = "white"
) %>%
gt_plt_bar_pct(column = proportion, scaled = TRUE, labels = TRUE, decimals = 2) %>%
cols_width(everything() ~ px(200)) %>%
cols_align(align = "center", columns = total:proportion) %>%
opt_horizontal_padding(scale = 3)
antihistamine_sales_table| Antihistamine Sales in Scotland | ||
| Data from NHS Scotland[...] | ||
| Antihistamine | Total Sales | Proportion of Antihistamine Sales |
|---|---|---|
| Cetirizine | 44,479,233 | 33.73% |
| Fexofenadine | 32,212,106 | 24.43% |
| Chlorphenamine | 29,252,644 | 22.18% |
| Loratadine | 12,341,177 | 9.36% |
| Promethazine | 6,692,444 | 5.07% |
| Cinnarizine | 3,469,256 | 2.63% |
| Hydroxyzine | 3,058,186 | 2.32% |
| Acrivastine | 353,334 | 0.27% |
| Diphenhydramine | 19,959 | 0.02% |
The most commonly prescribed antihistamines are Cetrizine, Fexofenadine, Chlorphenamine, and Loratadine. In total, these account for almost 90% of all antihistamine prescription sales in a year. On the contrary, Promethazine, Cinnarizine, Hydroxyne, Acrivastine, and Diphenhydramine aggregated barely surpass 10%.
Although all the above are antihistamines, not all are primarily used for allergy and hay fever treatment. Interestingly, those in the lower band mostly belong to first-generation drowsy antihistamines and are generally associated with different primary uses. These include insomnia, itchiness, anxiety, vertigo, travel sickness, tinnitus, ear conditions.
Alternatively, the dosage may simply be less convenient. Acrivastine, unlike Cetirizine or Loratadine, does not have a standard dose of a single tablet per day. The above reasons are either generally consistent throughout the year, or more randomly sporadic than hay fever, so further temporal exploration may aid in supporting the explanation.
Related to this latter point, the data here has been aggregated blinded towards dosage of drugs. This likely means drugs with higher dosage frequency will be over-represented compared to one-tablet-a-day drugs. That is, the difference in preference towards Cetirizine, one of the single daily dose drugs, is likely even greater than calculated. [TBF: explain this better]. Of note, most antihistamines are available Over-The-Counter, and do not require a GP prescription. Moreover, they can be fairly affordable, with 14 Certizine one-a-day tablets costing less than £2. The high accessibility, coupled by a seasonal onset of symptoms, will likely mean a significant proportion of the population may simply opt to purchase them on-the-go at pharmacies or supermarkets, rather than pursue a formal diagnosis / prescription with their GP. Therefore, the numbers observed here are most surely not an accurate quantification of population consumption, and should be used merely as a limited insight into antihistamine preferences within regulated healthcare.
The code below produces month_trend_plot, a plot showing
the monthly sales of each antihistamine across Scotland for the JUL-2023
to JUN-2024 period. It also highlights the 3 months with most sales for
each drug.
month_trend_plot = antihist_sold_by_month_and_drug %>% ggplot(aes(x = date, y = total, group = drug, colour = drug)) +
geom_line(linewidth = 1) +
geom_point(data = antihist_sold_by_month_and_drug_3, size = 2.5) +
facet_wrap(vars(drug)) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
legend.position = "none") +
scale_x_date(date_breaks = "1 month", labels = date_format("%b %Y")) +
scale_y_continuous(labels = comma) +
labs(title = "Monthly Antihistamine Sales from JUL-2023 to JUN-2024",
x = "Month",
y = "Total Sales",
caption = md("POINTS: top 3 months with most sales\nDATA: Prescriptions in the Community, Public Health Scotland"))
month_trend_plotThe code below produces trees_and_antih_map, an
interactive map showing the total antihistamine sales in Edinburgh GPs
during the JUL-2023 to JUN-2024 period. It includes a layer showing all
the trees managed by the city council in light green, and another
highlighting trees beloning to a strongly sensitizing genus in darker
green.
trees_and_antih_map <- leaflet() %>%
addProviderTiles(providers$OpenStreetMap) %>% # Add OpenStreetMap tiles as the base layer
# Add the district polygons, which will aid in boundary setting and point visualization
addPolygons(data = st_transform(postal_sector_edi, crs = 4326), # Note: st_transform aids in the conversion from Easting/Northing to coordinates leaflet can work with (CRS = 4326)
fillColor = "lightgray", fillOpacity = 0.6, color = "deeppink", weight = 0.5,popup = ~paste("District:", PostDist)) %>%
# Add light green points for all tree locations (with their species name!)
addCircleMarkers(data = st_transform(all_trees_sf, crs = 4326),
color = "forestgreen", radius = 1, stroke = FALSE, fillOpacity = 0.5, popup = ~paste("Tree Species:", LatinName), group = "All Tree Locations") %>%
# Add darker green points for allergenic tree locations (again, with their species name!)
addCircleMarkers(data = st_transform(allergenic_trees_sf, crs = 4326),
color = "darkgreen", radius = 1, stroke = FALSE, fillOpacity = 0.5, popup = ~paste("Tree Species:", LatinName), group = "Allergenic Tree Locations") %>%
# Add color-coded circles for each GP in Edinburgh reflecting antihistamines sold (with GP code and sales reported)
addCircleMarkers(data = st_transform(gp_edi_sf, crs = 4326),
color = ~colorNumeric("inferno", gp_edi_sf$antihist_sold, reverse = TRUE)(antihist_sold), radius = 5, fillOpacity = 1, stroke = FALSE, popup = ~paste("GP Code:", prac_code, "<br>", "Antihistamines Sold:", antihist_sold), group = "GP Locations") %>%
# Add title, subtitle and note
addControl(tags$div(HTML("<h4>Edinburgh: Antihistamine Prescription in GPs and Tree Location</h4><p>Prescriptions Data (gradient circles): JUL-2023 to JUN-2024; Tree Data (green points): JUN-2024.</p>")), position = "topright") %>%
addControl(tags$div(HTML("<small>Edinburgh Map from OpenStreetMap (OSM)</small>")), position = "bottomleft") %>%
# Add a legend for antihistamine sales colouring, an interactive layers pannel, and set default map view.
addLegend("bottomright", pal = colorNumeric("inferno", gp_edi_sf$antihist_sold, reverse = TRUE), values = gp_edi_sf$antihist_sold, title = "Antihistamines Sold", opacity = 1) %>%
addLayersControl(overlayGroups = c("GP Locations", "All Tree Locations", "Allergenic Tree Locations"), options = layersControlOptions(collapsed = FALSE)) %>%
setView(lng = -3.208267, lat = 55.933251, zoom = 11)
trees_and_antih_mapEdinburgh: antihistamine Prescription in GPs and Tree Location
[INTERPRETATION]
LIMITATIONS There are limitations in both the tree dataset and in the filtering steps for Edinburgh GP selection. Defining boundaries can be challenging, especially so when combining different administrative entities. ‘EH’ postcodes were used to select ‘Edinburgh’ GPs. While this is the Edinburgh postcode area, it also covers other cities. On the other hand, the Edinburgh Postal Map dataset used to draw the local boundaries only includes postcodes of the City of Edinburgh. Furthermore, the tree dataset instead opts to divide by Edinburgh Wards / Neighbourhood Partnerships (and other neighbouring cities, such as South Queensferry). This includes the “Pentland Hill” and “Almond” areas, which encompass coordinates outside-borders in the ‘stricter’ Postal Areas dataset, yet falls short compared to the wider ‘EH’-filtered dataset (GPs).
[TBF] This could be especially true for areas
As mentioned above, I intend to modify the interactiveness of the geographical plot. The work show belongs to the last section of my report. Earlier sections will have a table showing how much each antihistamine was sold in Scotland, and a line plot showing changes in antihistamines sold by month (grouped by antihistamine name, one coloured line each in the same graph).
Disclosure of GenAI Use:
ChatGPT was used to assist with generation of the background raster layer of the Geographical Plot, for which common packages and their free API allowances had changed recently.
It was also used to aid in the customization of the report’s aesthetics, namely to quickly check if an idea was feasible - which upon confirmation, I found more success forgoing the approaches it suggested and taking the informed search to Stack Overflow.
{#refs}